{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook, we are going to show how to use [BGE-M3](https://huggingface.co/BAAI/bge-m3) with LlamaIndex.\n",
    "\n",
    "BGE-M3 is a hybrid multilingual retrieval model that supports over 100 languages and can handle input lengths of up to 8,192 tokens. The model can perform (i) dense retrieval, (ii) sparse retrieval, and (iii) multi-vector retrieval."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting Started"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-indices-managed-bge-m3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating BGEM3Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import Settings\n",
    "from llama_index.core import Document\n",
    "from llama_index.indices.managed.bge_m3 import BGEM3Index\n",
    "\n",
    "Settings.chunk_size = 8192"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's create some demo corpus\n",
    "sentences = [\n",
    "    \"BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.\",\n",
    "    \"BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document\",\n",
    "]\n",
    "documents = [Document(doc_id=i, text=s) for i, s in enumerate(sentences)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Indexing with BGE-M3 model\n",
    "index = BGEM3Index.from_documents(\n",
    "    documents,\n",
    "    weights_for_different_modes=[\n",
    "        0.4,\n",
    "        0.2,\n",
    "        0.4,\n",
    "    ],  # [dense_weight, sparse_weight, multi_vector_weight]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Retrieve relevant documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = index.as_retriever()\n",
    "response = retriever.retrieve(\"What is BGE-M3?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## RAG with BGE-M3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"What is BGE-M3?\")"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
